NLP-NITMZ @ MSIR 2016 System for Code-Mixed Cross-Script Question Classification
Authors
Abstract
This paper describes our approach to the Code-Mixed Cross-Script Question Classification task, which is Subtask 1 of MSIR 2016. MSIR (Mixed Script Information Retrieval) is an event held in conjunction with FIRE 2016, the 8th meeting of the Forum for Information Retrieval Evaluation. For this task, our team NLP-NITMZ submitted three system runs: (i) using a direct feature set; (ii) using direct and dependent feature sets; and (iii) using a Naive Bayes classifier. The first system is our baseline, which is based on a direct feature set generated from a group of keywords. When identifying question classes, the baseline system runs into ambiguity, meaning that a single question is tagged with multiple classes. To resolve this ambiguity, we developed a second feature set. We call it the dependent feature set because its keywords work together with the direct feature set. Our highest accuracy, 78.88%, was obtained with method 2, which we submitted as run 3. Our other two runs both achieved an accuracy of 74.44%.
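The direct/dependent keyword scheme described above can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. The class labels, the transliterated Bengali keywords in the hypothetical `DIRECT` and `DEPENDENT` tables, the function name `classify`, the `MISC` default, and the alphabetical tie-break are all invented for illustration.

```python
import re

# Hypothetical direct feature set: each keyword proposes one or more classes.
DIRECT = {
    "kothay": {"LOC"},               # "where"
    "kobe": {"TEMP"},                # "when"
    "ke": {"PER"},                   # "who"
    "koto": {"NUM", "MNY", "DIST"},  # "how much/many": ambiguous on its own
}

# Hypothetical dependent feature set: keywords consulted only when the
# direct features leave more than one candidate class.
DEPENDENT = {
    "taka": "MNY",  # money
    "dam": "MNY",   # price
    "km": "DIST",   # distance
    "jon": "NUM",   # count of people
}


def classify(question: str, default: str = "MISC") -> str:
    """Assign one class to a romanized code-mixed question (illustrative)."""
    tokens = re.findall(r"[a-z]+", question.lower())
    # Step 1: collect every class proposed by a direct keyword.
    candidates = set()
    for tok in tokens:
        candidates |= DIRECT.get(tok, set())
    if not candidates:
        return default
    if len(candidates) == 1:
        return next(iter(candidates))
    # Step 2: ambiguity, so let a dependent keyword pick among the candidates.
    for tok in tokens:
        cls = DEPENDENT.get(tok)
        if cls in candidates:
            return cls
    # Step 3: still ambiguous, so fall back to an arbitrary deterministic choice.
    return sorted(candidates)[0]


if __name__ == "__main__":
    print(classify("Eiffel tower kothay?"))
    print(classify("ticket er dam koto taka?"))
```

A question whose direct keywords propose a single class is labelled immediately. The dependent keywords are consulted only when the direct features leave several candidates, which matches the abstract's description of the dependent set working together with the direct set.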
Similar resources
Code Mixed Cross Script Question Classification
With the growth of our society, one of the most affected aspects of our routine life is language. We tend to mix more than one language in our conversations, and mixing a regional language with English is an especially common practice. This mixing of languages is referred to as code mixing, where we mix different linguistic constituents such as phrases, proper nouns, morphemes etc. to come...
Amrita-CEN@MSIR-FIRE2016: Code-Mixed Question Classification using BoWs and RNN Embeddings
Question classification is a key task in many question answering applications. Nearly all previous work on question classification has used machine learning and knowledge-based methods. This working note presents an embedding-based Bag-of-Words method and a Recurrent Neural Network to achieve automatic question classification in code-mixed Bengali-English text. We build two systems that clas...
Overview of the Mixed Script Information Retrieval (MSIR) at FIRE-2016
The shared task on Mixed Script Information Retrieval (MSIR) was organized for the fourth year in FIRE-2016. The track had two subtasks. Subtask-1 was on question classification where questions were in code mixed Bengali-English and Bengali was written in transliterated Roman script. Subtask-2 was on ad-hoc retrieval of Hindi film song lyrics, movie reviews and astrology documents, where both t...
Modeling Classifier for Code Mixed Cross Script Questions
With a boom in the internet, social media text has been increasing day by day, and user-generated content (such as tweets and blogs) in Indian languages is written using Roman script due to various socio-cultural and technological reasons. A majority of these posts are multilingual in nature and many involve code mixing where lexical items and grammatical features from two languages app...
Ensemble Classifier based approach for Code-Mixed Cross-Script Question Classification
With the increasing popularity of social media, people post updates that help other users find answers to their questions. Most of the user-generated data on social media is in code-mixed or multi-script form, where the words are represented phonetically in a non-native script. We address the problem of question classification on social-media data. We propose an ensemble classifier based ap...